Introduction

Who am I?

Mr. (almost Dr.!) Richard E.W. Berl

(but “Ricky” is fine)

I am an evolutionary social (data) scientist with a background in behavior and cultural change and a passion for conserving biocultural diversity, advancing social good, and improving environmental sustainability.


B.A. Biological Sciences & B.A. Anthropology from University of Delaware (2009)



Field Assistant, Lomas Barbudal Monkey Project (2009-2010)


  • Social learning and behavioral traditions




M.S. Zoology from Washington State University (2015)

  • Social behavior and learning in captive gray wolves (Canis lupus) at Wolf Park in Battle Ground, IN





  • Cultural and genetic variation of the Chabu hunter-gatherers of Southwestern Ethiopia

     

    • Gopalan, S., Berl, R. E. W., Belbin, G., Gignoux, C., Feldman, M. W., Hewlett, B. S., & Henn, B. M. (2019). Hunter-gatherer genomes reveal diverse demographic trajectories following the rise of farming in East Africa [preprint]. bioRxiv, 517730. Available: https://www.biorxiv.org/node/152746.abstract

      Global ancestry proportions of northeast African individuals.

       

      Effective migration surfaces depicted as contour lines over A) satellite imagery, B) elevation and water features, and C) the geographic distribution of major language families in Eastern Africa.



Ph.D. Human Dimensions of Natural Resources from Colorado State University (2019)

  • Ph.D. Candidate in Human Dimensions of Natural Resources (defending on May 15th!)

  • Graduate Certificate in Applied Statistics

  • Influence of prestige in determining what people learn and from whom they choose to learn

     

    Prestige domain item loadings from exploratory factor analysis of attitudinal data.

     

    Determinants of prestige by level of social stratification across 16 societies.

     

    Mean proportion of propositions recalled from artificial creation stories by type of content bias and by speaker prestige.

     

    Color matrices of propositions recalled from artificial creation stories.


  • Volunteer data scientist for Trees, Water & People

     

    Random forest prediction of Pinus ponderosa var. scopulorum habitat suitability under present conditions on Pine Ridge Reservation and Trust Land.

     

    Correlation matrix heatmap of climatic and soil variables.

     

    Logistic regression of Pinus ponderosa var. scopulorum occurrence on burn area.



What we will cover in this course

  • See the Syllabus and Course Schedule

  • Objectives (from Syllabus)

    • Set up a convenient computing workflow

    • Write clean, thoroughly commented R code

    • Recognize different types of data, how they are measured, and how they are handled in R

    • Use the principle of ‘tidy data’ to effectively clean and format messy data sets

    • Creatively explore data sets with descriptive statistics and rough visualizations prior to confirmatory analyses

    • Clearly communicate results by visualizing data simply and effectively and by telling a compelling story with data

    • Conduct basic statistical tests and linear regression modeling

    • Explore advanced topics in data analysis, including dimensionality reduction and structural equation modeling

    • Utilize R for your own research by developing a research question, collecting and wrangling data, and conducting the appropriate analyses

    • Support reproducible research by documenting and embedding analyses in a written report

    • Use the skills you have learned to communicate your process and results to a general audience


What we will not cover

  • R Markdown (kind of) and R Notebook

  • LaTeX

  • Version control
    • Git / GitHub

  • Tibbles (tibble package) and piping (magrittr package)


Setting up a computing workflow

Create a folder structure for this course (see recommendations in required reading by FitzJohn):

nr592/
├── data/
├── docs/
├── figs/
├── output/
└── R/
    ├── assignment1.R
    └── lecture01.R
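
The folder structure above could also be created from R rather than by hand. This is only a sketch: `courseDir` is a hypothetical location (a temporary directory here, so nothing on your machine is overwritten), and you would substitute your own path.

```r
# Hypothetical location for the course folder; replace with your own path
courseDir = file.path(tempdir(), "nr592")
subDirs = c("data", "docs", "figs", "output", "R")

# Create each subfolder (recursive=TRUE also creates nr592/ itself)
for (d in subDirs) {
  dir.create(file.path(courseDir, d), recursive=TRUE, showWarnings=FALSE)
}

# Check which folders now exist inside nr592/
list.dirs(courseDir, full.names=FALSE, recursive=FALSE)
```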

Create a project for this course:

nr592/
├── data/
├── docs/
├── figs/
├── output/
├── R/
└── nr592.Rproj


Basic concepts in R

DON’T BE AFRAID TO FAIL!

Run current line/selection of code:

  • Ctrl+Enter (Windows)
  • Command+Enter (Mac)

Source: RStudio Keyboard Shortcuts

Objects

Variables

Assignment

a = 1
b = 2
c = 42
a
## [1] 1
b
## [1] 2
c
## [1] 42

Operations

a + b
## [1] 3
a^2 + b^2
## [1] 5
c / (a + b)
## [1] 14

Reassignment

a
## [1] 1
b
## [1] 2
a = a + b
a
## [1] 3
a = a + b
a
## [1] 5
a = a + b
a
## [1] 7

Scripts

Always work in scripts!

Open a new R script:

  • Ctrl+Shift+N (Windows)
  • Command+Shift+N (Mac)

Commenting

# "This line is commented out."
"This line is not commented out."
## [1] "This line is not commented out."

Comment/uncomment line:

  • Ctrl+Shift+C (Windows)
  • Command+Shift+C (Mac)

An example of well-commented code:

# Load iris data
data(iris)

# View structure of iris data
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

# Subset iris data to Species versicolor
irisVe = subset(x=iris, subset=Species == "versicolor")

# Find correlation between sepal length and petal length in versicolor
cor(x=irisVe$Sepal.Length, y=irisVe$Petal.Length)  # Result: 0.754049
## [1] 0.754049

Packages

Projects

Working directory
Relative paths
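
As a rough illustration of these two ideas: inside a project, the working directory is normally the project folder, so files can be referred to by paths relative to it. The file name `survey.csv` below is hypothetical and is used only to show how a relative path is built.

```r
# Where is R currently looking for files?
getwd()

# Build a path relative to the working directory
# ("survey.csv" is a made-up file name for illustration)
dataPath = file.path("data", "survey.csv")
dataPath
## [1] "data/survey.csv"
```

Because the path does not start from a drive letter or a user folder, the same script should keep working if the whole project folder is moved or shared.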

Data types and classes

Numeric

  • Continuous (floating point)
    • Real
    • Complex
  • Discrete
    • Binary (logical/boolean)
    • Integer
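
A quick way to see how R stores each of these numeric types is to ask for the class of a value. The values below are arbitrary examples.

```r
class(3.14)     # real number, stored as double-precision floating point
## [1] "numeric"
class(2 + 3i)   # complex number
## [1] "complex"
class(TRUE)     # binary (logical/boolean)
## [1] "logical"
class(42L)      # integer; the L suffix asks for integer storage
## [1] "integer"
class(42)       # without the L, R stores the value as a double
## [1] "numeric"
```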

Ordinal

Ordered factor

Categorical

  • String (character)
  • Factor
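
The difference between character strings, factors, and ordered factors can be sketched with a small made-up vector. The `responses` data below are hypothetical and exist only for illustration.

```r
# Hypothetical survey responses stored as character strings
responses = c("low", "high", "medium", "low")
class(responses)
## [1] "character"

# Unordered factor: categories with no inherent order (levels sort alphabetically)
respFactor = factor(responses)
levels(respFactor)
## [1] "high"   "low"    "medium"

# Ordered factor: levels are ranked, so comparisons are meaningful
respOrdered = factor(responses, levels=c("low", "medium", "high"), ordered=TRUE)
respOrdered < "high"
## [1]  TRUE FALSE  TRUE  TRUE
```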

Date

lubridate

Data structures

Vectors

Can only have one type
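
One consequence of this rule is that R silently converts ("coerces") mixed values to a single common type. A minimal sketch with arbitrary values:

```r
# Mixing numbers, strings, and logicals: everything becomes character
mixed = c(1, "two", TRUE)
mixed
## [1] "1"    "two"  "TRUE"
class(mixed)
## [1] "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
c(1, TRUE, FALSE)
## [1] 1 1 0
```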

Matrices

Data Frames

Lists

Functions

a
## [1] 7
b
## [1] 2
c
## [1] 42
mean(x=c(a, b, c))
## [1] 17

Let’s make our own function to add the variable a to the variable b.

a_plus_b = function(a, b){
  aPlusB = a + b
  return(aPlusB)
}

a_plus_b(a=1, b=5)
## [1] 6

What if we just use aPlusB instead of the whole function?

aPlusB(a=1, b=5)
## Error in aPlusB(a = 1, b = 5): could not find function "aPlusB"

Why doesn’t this work?

aPlusB is a variable, not a function. You can’t pass other variables to it.

Let’s see what value is stored for the aPlusB variable.

aPlusB
## Error in eval(expr, envir, enclos): object 'aPlusB' not found

Why doesn’t this work?

aPlusB only exists inside the a_plus_b function. It doesn’t have a value assigned to it in the “global environment” that we’re working in. (Look in the “Environment” tab in RStudio.) aPlusB is called an “internal variable” (more commonly, a “local variable”) because it is only assigned inside the function when the function is called, and it is removed as soon as the function has finished running.
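
To keep the result, assign the value that the function returns to a new variable in the global environment. In the sketch below, the name `result` is an arbitrary choice, and the `exists()` check assumes a fresh R session in which nothing else is named aPlusB.

```r
a_plus_b = function(a, b){
  aPlusB = a + b   # exists only while the function is running
  return(aPlusB)
}

# Store the returned value under a name of our choosing
result = a_plus_b(a=1, b=5)
result
## [1] 6

# Nothing called aPlusB was left behind in the global environment
exists("aPlusB")
## [1] FALSE
```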

Operations

Common Functions

  • str()
  • names()
  • class()
  • dim()
  • length()
  • nrow()
  • ncol()
  • gc()
  • ?
  • View()
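
Several of these functions can be tried on the built-in iris data used earlier. This is only a quick tour, not a full description of each function.

```r
data(iris)

class(iris)    # what kind of object is it?
## [1] "data.frame"
dim(iris)      # number of rows, then number of columns
## [1] 150   5
nrow(iris)
## [1] 150
ncol(iris)
## [1] 5
length(iris)   # for a data frame, the number of columns
## [1] 5
names(iris)    # column names
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

# In an interactive session, ?mean opens the help page for mean(),
# and View(iris) opens the data in a spreadsheet-style viewer in RStudio.
```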

Subsetting
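
A few common ways to subset, sketched with the iris data from the commenting example. These are base R idioms chosen for illustration.

```r
data(iris)

# Square brackets select by [row, column]
iris[1:3, "Sepal.Length"]
## [1] 5.1 4.9 4.7

# The $ operator extracts one column by name
iris$Sepal.Length[1:3]
## [1] 5.1 4.9 4.7

# A logical condition keeps only the rows where it is TRUE
irisVe = iris[iris$Species == "versicolor", ]
nrow(irisVe)
## [1] 50
```

The last line does the same job as the subset() call shown earlier.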

Conditionals
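
A minimal sketch of conditionals, reusing the value of a from the reassignment example. The threshold of 5 and the labels "big" and "small" are arbitrary.

```r
a = 7

# if / else runs one branch, depending on a single TRUE/FALSE condition
if (a > 5) {
  size = "big"
} else {
  size = "small"
}
size
## [1] "big"

# ifelse() applies the same test to every element of a vector
ifelse(c(1, 7, 42) > 5, "big", "small")
## [1] "small" "big"   "big"
```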

Iteration

Loops

Apply
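
As a rough comparison of the two approaches, the sketch below computes the mean of each numeric column of iris, first with a for loop and then with sapply(). The variable names are arbitrary.

```r
data(iris)
numCols = names(iris)[1:4]   # the four numeric columns

# Loop: create an empty named vector, then fill it one column at a time
colMeansLoop = numeric(length(numCols))
names(colMeansLoop) = numCols
for (col in numCols) {
  colMeansLoop[col] = mean(iris[[col]])
}

# Apply: sapply() calls mean() on each column and collects the results
colMeansApply = sapply(iris[, 1:4], mean)

# Both approaches give the same answer
all.equal(colMeansLoop, colMeansApply)
## [1] TRUE
```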


